In data mining, association rule learning is a popular and well researched method for discovering interesting relations between variables in large databases. Piatetsky-Shapiro[1] describes analyzing and presenting strong rules discovered in databases using different measures of interestingness. Based on the concept of strong rules, Agrawal et al.[2] introduced association rules for discovering regularities between products in large scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as, e.g., promotional pricing or product placements. In addition to the above example from market basket analysis association rules are employed today in many application areas including Web usage mining, intrusion detection and bioinformatics.
Contents |
transaction ID | milk | bread | butter | beer |
---|---|---|---|---|
1 | 1 | 1 | 0 | 0 |
2 | 0 | 0 | 1 | 0 |
3 | 0 | 0 | 0 | 1 |
4 | 1 | 1 | 1 | 0 |
5 | 0 | 1 | 0 | 0 |
Following the original definition by Agrawal et al.[2] the problem of association rule mining is defined as: Let be a set of binary attributes called items. Let be a set of transactions called the database. Each transaction in has a unique transaction ID and contains a subset of the items in . A rule is defined as an implication of the form where and . The sets of items (for short itemsets) and are called antecedent (left-hand-side or LHS) and consequent (right-hand-side or RHS) of the rule respectively.
To illustrate the concepts, we use a small example from the supermarket domain. The set of items is and a small database containing the items (1 codes presence and 0 absence of an item in a transaction) is shown in the table to the right. An example rule for the supermarket could be meaning that if butter and bread is bought, customers also buy milk.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence.
Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps:
While the second step is straight forward, the first step needs more attention.
Finding all frequent itemsets in a database is difficult since it involves searching all possible itemsets (item combinations). The set of possible itemsets is the power set over and has size (excluding the empty set which is not a valid itemset). Although the size of the powerset grows exponentially in the number of items in , efficient search is possible using the downward-closure property of support[2][4] (also called anti-monotonicity[5]) which guarantees that for a frequent itemset, all its subsets are also frequent and thus for an infrequent itemset, all its supersets must also be infrequent. Exploiting this property, efficient algorithms (e.g., Apriori[6] and Eclat[7]) can find all frequent itemsets.
The concept of association rules was popularised particularly due to the 1993 article of Agrawal,[2] which has acquired more than 6000 citations according to Google Scholar, as of March 2008, and is thus one of the most cited papers in the Data Mining field. However, it is possible that what is now called "association rules" is similar to what appears in the 1966 paper[8] on GUHA, a general data mining method developed by Petr Hájek et al.[9]
Next to confidence also other measures of interestingness for rules were proposed. Some popular measures are:
A definition of these measures can be found here. Several more measures are presented and compared by Tan et al.[14] Looking for techniques that can model what the user has known (and using this models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness"
One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but only do so by chance. For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if there is no association. If we assume there are no associations, we should nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery[15][16] controls this risk, in most cases reducing the risk of finding any spurious associations to a user-specified significance level.
Many algorithms for generating association rules were presented over time.
Some well known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from frequent itemsets found in a database.
Apriori[6] is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to counting the support of itemsets and uses a candidate generation function which exploits the downward closure property of support.
Eclat[7] is a depth-first search algorithm using set intersection.
FP-growth (frequent pattern growth)[17] uses an extended prefix-tree (FP-tree) structure to store the database in a compressed form. FP-growth adopts a divide-and-conquer approach to decompose both the mining tasks and the databases. It uses a pattern fragment growth method to avoid the costly process of candidate generation and testing used by Apriori.
GUHA is a general method for exploratory data analysis that has theoretical foundations in observational calculi.[18] The ASSOC procedure[19] is a GUHA method which mines for generalized association rules using fast bitstrings operations. The association rules mined by this method are more general than those output by apriori, for example "items" can be connected both with conjunction and disjunctions and the relation between antecedent and consequent of the rule is not restricted to setting minimum support and confidence as in apriori: an arbitrary combination of supported interest measures can be used.
OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does not require either monotone or anti-monotone constraints such as minimum support.[20] Initially used to find rules for a fixed consequent[20][21] it has subsequently been extended to find rules with any item as a consequent.[22] OPUS search is the core technology in the popular Magnum Opus association discovery system.
A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data. There are varying opinions as to how much of the story is true.[23] Daniel Powers says[23]
In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.
Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.[24]
Weighted class learning is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
K-optimal pattern discovery provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.
Mining frequent sequences uses support to find sequences in temporal data.[25]
Generalized Association Rules hierarchical taxonomy (concept hierarchy)
Quantitiative Association Rules categorical and quantitative data
Interval Data Association Rules e.g. partition the age into 5-year-increment ranged
Maximal Association Rules
Sequential Association Rules temporal data e.g. first buy computer, then CD Roms, then a webcam.